Overview

Dataset statistics

Number of variables: 9
Number of observations: 98913
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 91012
Duplicate rows (%): 92.0%
Total size in memory: 11.6 MiB
Average record size in memory: 123.0 B

Variable types

Categorical: 1
Numeric: 8

Warnings

Dataset has 91012 (92.0%) duplicate rows  [Duplicates]
socialNbFollowers is highly skewed (γ1 = 88.81691016)  [Skewed]
socialNbFollows is highly skewed (γ1 = 220.8766787)  [Skewed]
socialProductsLiked is highly skewed (γ1 = 244.1577429)  [Skewed]
productsListed is highly skewed (γ1 = 64.89321853)  [Skewed]
productsSold is highly skewed (γ1 = 41.59563253)  [Skewed]
productsWished is highly skewed (γ1 = 49.25695941)  [Skewed]
productsBought is highly skewed (γ1 = 84.79735987)  [Skewed]
socialProductsLiked has 82987 (83.9%) zeros  [Zeros]
productsListed has 97189 (98.3%) zeros  [Zeros]
productsSold has 96877 (97.9%) zeros  [Zeros]
productsPassRate has 97979 (99.1%) zeros  [Zeros]
productsWished has 89612 (90.6%) zeros  [Zeros]
productsBought has 93494 (94.5%) zeros  [Zeros]

Reproduction

Analysis started: 2021-04-01 09:47:08.497880
Analysis finished: 2021-04-01 09:48:04.246175
Duration: 55.75 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml

Variables

language
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 MiB
en: 51564
fr: 26372
it: 7766
de: 7178
es: 6033

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 197826
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
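As a rough illustration, the character statistics in this section can be approximated with Python's standard `unicodedata` module. This is a sketch, not pandas-profiling's code: `character_summary` is a hypothetical helper, `unicodedata.category` returns two-letter general-category codes (`Ll` is Lowercase Letter), and because the standard library has no Unicode block lookup, a plain ASCII check stands in for the block statistics.

```python
import unicodedata
from collections import Counter


def character_summary(values):
    """Character statistics for a list of strings (hypothetical helper)."""
    chars = Counter()
    for value in values:
        chars.update(value)  # counts each character of each string
    # General category per character, weighted by occurrences ("Ll" = Lowercase Letter).
    categories = Counter()
    for ch, n in chars.items():
        categories[unicodedata.category(ch)] += n
    return {
        "total_characters": sum(chars.values()),
        "distinct_characters": len(chars),
        "categories": dict(categories),
        # Assumption: an ASCII check stands in for a Unicode block lookup,
        # which the standard library does not provide.
        "all_ascii": all(ord(ch) < 128 for ch in chars),
        "most_common": chars.most_common(3),
    }


summary = character_summary(["en", "en", "fr", "de", "es", "it"])
print(summary["total_characters"], summary["distinct_characters"])  # → 12 8
print(summary["categories"])  # → {'Ll': 12}
```

On the full column the same counting gives 98913 × 2 = 197826 characters, all in a single category.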

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: en
2nd row: en
3rd row: fr
4th row: en
5th row: en

Common values

Value   Count  Frequency (%)
en      51564  52.1%
fr      26372  26.7%
it      7766   7.9%
de      7178   7.3%
es      6033   6.1%

[Figure: Histogram of lengths of the category]

Most occurring characters

Value   Count  Frequency (%)
e       64775  32.7%
n       51564  26.1%
f       26372  13.3%
r       26372  13.3%
i       7766   3.9%
t       7766   3.9%
d       7178   3.6%
s       6033   3.0%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  197826  100.0%

Most frequent character per category

Lowercase Letter: same characters and counts as the Most occurring characters table.

Most occurring scripts

Value   Count   Frequency (%)
Latin   197826  100.0%

Most frequent character per script

Latin: same characters and counts as the Most occurring characters table.

Most occurring blocks

Value   Count   Frequency (%)
ASCII   197826  100.0%

Most frequent character per block

ASCII: same characters and counts as the Most occurring characters table.

socialNbFollowers
Real number (ℝ≥0)

SKEWED

Distinct: 90
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.432268761
Minimum: 3
Maximum: 744
Zeros: 0
Zeros (%): 0.0%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 3
5th percentile: 3
Q1: 3
Median: 3
Q3: 3
95th percentile: 5
Maximum: 744
Range: 741
Interquartile range (IQR): 0
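The quantile statistics can be reproduced with linear interpolation between order statistics, the default of `pandas.Series.quantile` and of NumPy's percentile functions; that pandas-profiling uses this default is an assumption. MAD is taken here as the median of the absolute deviations from the median, which is consistent with the reported MAD of 0. `quantile` and `quantile_summary` are illustrative names. On toy data shaped like socialNbFollowers (a block of 3s plus one far outlier), the IQR and MAD stay small while the range is huge.

```python
def quantile(values, p):
    """Quantile by linear interpolation between order statistics (0 <= p <= 1)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p      # fractional position in the sorted data
    lo = int(h)
    if lo + 1 >= len(xs):      # p == 1, or a single value
        return float(xs[-1])
    return xs[lo] + (h - lo) * (xs[lo + 1] - xs[lo])


def quantile_summary(values):
    q1, median, q3 = (quantile(values, p) for p in (0.25, 0.5, 0.75))
    # Assumption: MAD = median of absolute deviations from the median.
    mad = quantile([abs(x - median) for x in values], 0.5)
    return {
        "min": min(values), "p5": quantile(values, 0.05), "q1": q1,
        "median": median, "q3": q3, "p95": quantile(values, 0.95),
        "max": max(values), "range": max(values) - min(values),
        "iqr": q3 - q1, "mad": mad,
    }


# Toy data loosely shaped like socialNbFollowers: mostly 3s, one far outlier.
stats = quantile_summary([3, 3, 3, 3, 4, 5, 744])
print(stats["median"], stats["iqr"], stats["mad"], stats["range"])  # → 3.0 1.5 0.0 741
```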

Descriptive statistics

Standard deviation: 3.882383028
Coefficient of variation (CV): 1.131141906
Kurtosis: 14415.30703
Mean: 3.432268761
Median Absolute Deviation (MAD): 0
Skewness: 88.81691016
Sum: 339496
Variance: 15.07289798
Monotonicity: Not monotonic
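A sketch of how these descriptive statistics are likely computed, assuming pandas-profiling relies on pandas' own methods: sample standard deviation and variance (ddof = 1), `Series.skew()` (the bias-adjusted Fisher-Pearson coefficient) and `Series.kurt()` (bias-adjusted excess kurtosis, 0 for a normal distribution). CV is std / mean; for this variable 3.882383028 / 3.432268761 ≈ 1.1311, matching the reported value. `describe` is a hypothetical plain-Python helper, not the library's code.

```python
import math


def describe(values):
    """Mean, spread and shape statistics with pandas-style bias adjustments."""
    n = len(values)  # skewness needs n >= 3, kurtosis n >= 4, and non-constant data
    mean = sum(values) / n
    dev = [x - mean for x in values]
    s2 = sum(d ** 2 for d in dev)  # sums of powered deviations
    s3 = sum(d ** 3 for d in dev)
    s4 = sum(d ** 4 for d in dev)
    variance = s2 / (n - 1)        # sample variance (ddof = 1)
    std = math.sqrt(variance)
    # Adjusted Fisher-Pearson skewness, as in pandas Series.skew (assumed used).
    skew = n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5
    # Bias-adjusted excess kurtosis, as in pandas Series.kurt (assumed used).
    kurt = (n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"mean": mean, "std": std, "variance": variance,
            "cv": std / mean, "skewness": skew, "kurtosis": kurt}


print(describe([1, 2, 3, 4, 5]))                              # symmetric: skewness 0, kurtosis about -1.2
print(describe([3, 3, 3, 3, 3, 4, 5, 744])["skewness"] > 0)  # → True
```

A long right tail such as the single 744 drives both skewness and kurtosis up, which is why all count columns here are flagged as skewed.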

[Figure: Histogram with fixed size bins (bins=50)]
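A histogram with fixed size bins splits the interval from the minimum to the maximum into equal-width bins; `numpy.histogram` does this with the last bin closed on the right, and pandas-profiling presumably relies on it. The plain-Python sketch below follows that convention (`fixed_width_histogram` is a hypothetical name). For socialNbFollowers, 50 bins over the range 3 to 744 are each 14.82 wide, so every value below 17.82 falls into the first bin; since the values 3 through 12 alone cover 98185 of 98913 rows, the plot is essentially a single bar.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant data: widen the range like NumPy does
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # The last bin is closed on the right, so x == hi lands in bin bins - 1.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


counts, edges = fixed_width_histogram(list(range(10)), bins=5)
print(counts)  # → [2, 2, 2, 2, 2]
```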

Common values

Value               Count  Frequency (%)
3                   84939  85.9%
4                   8219   8.3%
5                   2720   2.7%
6                   813    0.8%
7                   539    0.5%
8                   336    0.3%
9                   235    0.2%
10                  164    0.2%
11                  121    0.1%
12                  99     0.1%
Other values (80)   728    0.7%
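The common values table lists the ten most frequent values and folds the rest into an "Other values (k)" row, where k is the number of remaining distinct values; the extreme-value tables list the smallest and largest distinct values with their counts. A stdlib sketch of all three, assuming ties are broken arbitrarily (the report does not state its tie-breaking); `frequency_tables` is a hypothetical helper.

```python
from collections import Counter


def frequency_tables(values, top=10):
    """Return (common, other, minimum, maximum) frequency tables."""
    n = len(values)
    counts = Counter(values)
    common = [(v, c, 100 * c / n) for v, c in counts.most_common(top)]
    rest = len(counts) - len(common)              # distinct values not shown
    other_count = n - sum(c for _, c, _ in common)
    other = (f"Other values ({rest})", other_count, 100 * other_count / n)
    ordered = sorted(counts)                      # distinct values, ascending
    minimum = [(v, counts[v]) for v in ordered[:top]]
    maximum = [(v, counts[v]) for v in reversed(ordered[-top:])]
    return common, other, minimum, maximum


data = [3] * 6 + [4] * 3 + [5, 9, 744]
common, other, minimum, maximum = frequency_tables(data, top=2)
print(common)   # → [(3, 6, 50.0), (4, 3, 25.0)]
print(other)    # → ('Other values (3)', 3, 25.0)
print(maximum)  # → [(744, 1), (9, 1)]
```

With the report's numbers, 90 distinct values minus the 10 shown gives "Other values (80)", and 98913 minus the 98185 shown rows gives its count of 728.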

Minimum 10 values

Value   Count  Frequency (%)
3       84939  85.9%
4       8219   8.3%
5       2720   2.7%
6       813    0.8%
7       539    0.5%
8       336    0.3%
9       235    0.2%
10      164    0.2%
11      121    0.1%
12      99     0.1%

Maximum 10 values

Value   Count  Frequency (%)
744     1      < 0.1%
353     1      < 0.1%
205     1      < 0.1%
176     1      < 0.1%
172     1      < 0.1%
167     2      < 0.1%
147     1      < 0.1%
137     1      < 0.1%
131     1      < 0.1%
130     1      < 0.1%

socialNbFollows
Real number (ℝ≥0)

SKEWED

Distinct: 85
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.42567711
Minimum: 0
Maximum: 13764
Zeros: 39
Zeros (%): < 0.1%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 8
Q1: 8
Median: 8
Q3: 8
95th percentile: 8
Maximum: 13764
Range: 13764
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 52.83957192
Coefficient of variation (CV): 6.271255262
Kurtosis: 52718.3891
Mean: 8.42567711
Median Absolute Deviation (MAD): 0
Skewness: 220.8766787
Sum: 833409
Variance: 2792.02036
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
8                   94893  95.9%
9                   2386   2.4%
10                  618    0.6%
11                  260    0.3%
12                  148    0.1%
13                  94     0.1%
15                  55     0.1%
14                  53     0.1%
7                   52     0.1%
0                   39     < 0.1%
Other values (75)   315    0.3%

Minimum 10 values

Value   Count  Frequency (%)
0       39     < 0.1%
1       5      < 0.1%
2       8      < 0.1%
3       6      < 0.1%
4       11     < 0.1%
5       11     < 0.1%
6       7      < 0.1%
7       52     0.1%
8       94893  95.9%
9       2386   2.4%

Maximum 10 values

Value   Count  Frequency (%)
13764   1      < 0.1%
8268    1      < 0.1%
3649    1      < 0.1%
2013    1      < 0.1%
500     1      < 0.1%
482     1      < 0.1%
450     1      < 0.1%
431     1      < 0.1%
421     1      < 0.1%
209     1      < 0.1%

socialProductsLiked
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 420
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.420743482
Minimum: 0
Maximum: 51671
Zeros: 82987
Zeros (%): 83.9%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 8
Maximum: 51671
Range: 51671
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 181.0305695
Coefficient of variation (CV): 40.95025423
Kurtosis: 67765.24122
Mean: 4.420743482
Median Absolute Deviation (MAD): 0
Skewness: 244.1577429
Sum: 437269
Variance: 32772.06708
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
0                   82987  83.9%
1                   5261   5.3%
2                   1898   1.9%
3                   1215   1.2%
4                   973    1.0%
5                   644    0.7%
6                   532    0.5%
7                   436    0.4%
8                   359    0.4%
9                   316    0.3%
Other values (410)  4292   4.3%

Minimum 10 values

Value   Count  Frequency (%)
0       82987  83.9%
1       5261   5.3%
2       1898   1.9%
3       1215   1.2%
4       973    1.0%
5       644    0.7%
6       532    0.5%
7       436    0.4%
8       359    0.4%
9       316    0.3%

Maximum 10 values

Value   Count  Frequency (%)
51671   1      < 0.1%
16040   1      < 0.1%
7044    1      < 0.1%
5979    1      < 0.1%
5598    1      < 0.1%
5595    1      < 0.1%
5109    1      < 0.1%
3037    1      < 0.1%
2942    1      < 0.1%
2823    1      < 0.1%

productsListed
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 65
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.09330421684
Minimum: 0
Maximum: 244
Zeros: 97189
Zeros (%): 98.3%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 244
Range: 244
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.050143546
Coefficient of variation (CV): 21.97267835
Kurtosis: 5760.301256
Mean: 0.09330421684
Median Absolute Deviation (MAD): 0
Skewness: 64.89321853
Sum: 9229
Variance: 4.203088557
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
0                   97189  98.3%
1                   808    0.8%
2                   278    0.3%
3                   150    0.2%
4                   98     0.1%
5                   62     0.1%
6                   45     < 0.1%
7                   40     < 0.1%
8                   29     < 0.1%
10                  22     < 0.1%
Other values (55)   192    0.2%

Minimum 10 values

Value   Count  Frequency (%)
0       97189  98.3%
1       808    0.8%
2       278    0.3%
3       150    0.2%
4       98     0.1%
5       62     0.1%
6       45     < 0.1%
7       40     < 0.1%
8       29     < 0.1%
9       20     < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
244     1      < 0.1%
217     1      < 0.1%
202     1      < 0.1%
185     1      < 0.1%
123     1      < 0.1%
122     1      < 0.1%
117     2      < 0.1%
113     1      < 0.1%
102     1      < 0.1%
96      1      < 0.1%

productsSold
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 75
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1215917018
Minimum: 0
Maximum: 174
Zeros: 96877
Zeros (%): 97.9%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 174
Range: 174
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.126895354
Coefficient of variation (CV): 17.49210943
Kurtosis: 2355.673441
Mean: 0.1215917018
Median Absolute Deviation (MAD): 0
Skewness: 41.59563253
Sum: 12027
Variance: 4.523683846
Monotonicity: Decreasing
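The monotonicity labels suggest pandas' non-strict checks (`Series.is_monotonic_increasing` and `Series.is_monotonic_decreasing`, where equal neighbours are allowed) plus strict variants. Under that reading, which is an assumption rather than something the report states, "Decreasing" for productsSold means the rows are stored so that no value is followed by a larger one, for example sorted by productsSold in descending order. A minimal plain-Python sketch of such a classifier; `monotonicity` is a hypothetical helper whose labels mirror the report's wording.

```python
def monotonicity(values):
    """Classify a sequence by its order (assumed rules, report-style labels)."""
    pairs = list(zip(values, values[1:]))  # neighbouring pairs
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"                # ties allowed, never goes up
    return "Not monotonic"


print(monotonicity([174, 170, 163, 0, 0, 0]))  # → Decreasing
print(monotonicity([3, 8, 3]))                 # → Not monotonic
```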

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
0                   96877  97.9%
1                   917    0.9%
2                   325    0.3%
3                   154    0.2%
4                   124    0.1%
6                   58     0.1%
5                   58     0.1%
7                   45     < 0.1%
9                   42     < 0.1%
8                   31     < 0.1%
Other values (65)   282    0.3%

Minimum 10 values

Value   Count  Frequency (%)
0       96877  97.9%
1       917    0.9%
2       325    0.3%
3       154    0.2%
4       124    0.1%
5       58     0.1%
6       58     0.1%
7       45     < 0.1%
8       31     < 0.1%
9       42     < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
174     1      < 0.1%
170     1      < 0.1%
163     1      < 0.1%
152     1      < 0.1%
125     1      < 0.1%
123     1      < 0.1%
108     1      < 0.1%
106     1      < 0.1%
104     1      < 0.1%
92      1      < 0.1%

productsPassRate
Real number (ℝ≥0)

ZEROS

Distinct: 72
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8123027307
Minimum: 0
Maximum: 100
Zeros: 97979
Zeros (%): 99.1%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 100
Range: 100
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 8.500205194
Coefficient of variation (CV): 10.46433167
Kurtosis: 114.0391218
Mean: 0.8123027307
Median Absolute Deviation (MAD): 0
Skewness: 10.66729865
Sum: 80347.3
Variance: 72.25348834
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
0                   97979  99.1%
100                 441    0.4%
66                  63     0.1%
50                  57     0.1%
75                  42     < 0.1%
83                  25     < 0.1%
90                  25     < 0.1%
80                  22     < 0.1%
85                  20     < 0.1%
60                  16     < 0.1%
Other values (62)   223    0.2%

Minimum 10 values

Value   Count  Frequency (%)
0       97979  99.1%
25      5      < 0.1%
28      2      < 0.1%
31      1      < 0.1%
33      8      < 0.1%
35      1      < 0.1%
37      2      < 0.1%
40      2      < 0.1%
41.6    1      < 0.1%
42      1      < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
100     441    0.4%
99      1      < 0.1%
98.7    1      < 0.1%
98      8      < 0.1%
96.4    1      < 0.1%
96.2    1      < 0.1%
96      5      < 0.1%
95      5      < 0.1%
94      8      < 0.1%
93      12     < 0.1%

productsWished
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 279
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.562595412
Minimum: 0
Maximum: 2635
Zeros: 89612
Zeros (%): 90.6%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 2
Maximum: 2635
Range: 2635
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 25.19279323
Coefficient of variation (CV): 16.12240317
Kurtosis: 3369.163069
Mean: 1.562595412
Median Absolute Deviation (MAD): 0
Skewness: 49.25695941
Sum: 154561
Variance: 634.6768308
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
0                   89612  90.6%
1                   3375   3.4%
2                   1339   1.4%
3                   797    0.8%
4                   526    0.5%
5                   406    0.4%
6                   299    0.3%
7                   252    0.3%
8                   176    0.2%
9                   158    0.2%
Other values (269)  1973   2.0%

Minimum 10 values

Value   Count  Frequency (%)
0       89612  90.6%
1       3375   3.4%
2       1339   1.4%
3       797    0.8%
4       526    0.5%
5       406    0.4%
6       299    0.3%
7       252    0.3%
8       176    0.2%
9       158    0.2%

Maximum 10 values

Value   Count  Frequency (%)
2635    1      < 0.1%
1916    1      < 0.1%
1900    1      < 0.1%
1842    1      < 0.1%
1820    1      < 0.1%
1783    1      < 0.1%
1622    1      < 0.1%
1295    1      < 0.1%
1225    1      < 0.1%
1113    1      < 0.1%

productsBought
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 70
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1719288668
Minimum: 0
Maximum: 405
Zeros: 93494
Zeros (%): 94.5%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 1
Maximum: 405
Range: 405
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.332265666
Coefficient of variation (CV): 13.56529424
Kurtosis: 11871.75975
Mean: 0.1719288668
Median Absolute Deviation (MAD): 0
Skewness: 84.79735987
Sum: 17006
Variance: 5.439463136
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
0                   93494  94.5%
1                   3297   3.3%
2                   845    0.9%
3                   364    0.4%
4                   214    0.2%
5                   139    0.1%
6                   108    0.1%
7                   65     0.1%
8                   52     0.1%
9                   40     < 0.1%
Other values (60)   295    0.3%

Minimum 10 values

Value   Count  Frequency (%)
0       93494  94.5%
1       3297   3.3%
2       845    0.9%
3       364    0.4%
4       214    0.2%
5       139    0.1%
6       108    0.1%
7       65     0.1%
8       52     0.1%
9       40     < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
405     1      < 0.1%
279     1      < 0.1%
174     1      < 0.1%
115     1      < 0.1%
105     1      < 0.1%
93      1      < 0.1%
87      1      < 0.1%
85      1      < 0.1%
81      1      < 0.1%
80      1      < 0.1%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
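A direct transcription of that definition in plain Python, as a sketch (`pearson_r` is an illustrative name; pandas computes the same quantity with different numerics). The two example lines have very different slopes, yet both give |r| = 1, which is the location-and-scale invariance described above.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # The 1/(n - 1) factors of covariance and std cancel, so raw sums suffice.
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))  # steep rising line: r ≈ 1
print(pearson_r(x, [-0.1 * v for v in x]))   # shallow falling line: r ≈ -1
```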

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
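A sketch of that recipe in plain Python: rank each variable, giving tied values the average of their positions, then apply Pearson's formula to the ranks. Average ranking matters for columns like these, where most values are tied at zero; pandas' `method="spearman"` is understood to delegate to SciPy's `spearmanr`, which also uses average ranks, but treat that as an assumption. The helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                          # extend the run of tied values
        avg = (i + j) / 2 + 1               # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]       # nonlinear but monotonic
print(spearman_rho(x, y))     # → 1.0
print(pearson(x, y) < 1)      # → True
```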

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
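The definition above is Kendall's tau-a. A direct O(n²) sketch follows, for illustration only: the report's 98913 rows would give roughly 4.9 billion pairs. Pairs tied in X or Y count as neither concordant nor discordant here. pandas' `DataFrame.corr(method="kendall")` goes through SciPy's `kendalltau`, which by default computes tau-b, a variant that adjusts the denominator for ties; with as many tied zeros as these columns have, the two can differ noticeably. Whether the report's matrix uses tau-b is an assumption; `kendall_tau_a` is an illustrative name.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs, i.e. Kendall's tau-a."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1    # both variables move in the same direction
        elif s < 0:
            discordant += 1    # they move in opposite directions
        # s == 0: a tie in X or Y, counted as neither.
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# 5 concordant and 1 discordant pair out of 6: tau = 4/6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```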

Phik (φk)

Phik (φk) is a newer correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion visually.]

Sample

First rows

languagesocialNbFollowerssocialNbFollowssocialProductsLikedproductsListedproductsSoldproductsPassRateproductsWishedproductsBought
0en14710772617474.01041
1en167821917099.000
2fr13713603316394.0103
3en131101412215292.070
4en1678025125100.000
5de1301214712391.000
6en121011403110894.0531105
7fr5393510698.000
8it7441376451671010485.018420
9en578451239274.062

Last rows

       language  socialNbFollowers  socialNbFollows  socialProductsLiked  productsListed  productsSold  productsPassRate  productsWished  productsBought
98903  es        3                  8                0                    0               0             0.0               0               0
98904  en        3                  8                0                    0               0             0.0               0               0
98905  en        3                  8                6                    0               0             0.0               0               0
98906  en        3                  8                0                    0               0             0.0               0               0
98907  en        3                  8                0                    0               0             0.0               0               0
98908  fr        3                  8                0                    0               0             0.0               0               0
98909  fr        3                  8                0                    0               0             0.0               0               0
98910  en        3                  8                0                    0               0             0.0               0               0
98911  it        3                  8                0                    0               0             0.0               0               0
98912  fr        3                  8                0                    0               0             0.0               0               0

Duplicate rows

Most frequent

      language  socialNbFollowers  socialNbFollows  socialProductsLiked  productsListed  productsSold  productsPassRate  productsWished  productsBought  count
144   en        3                  8                0                    0               0             0.0               0               0               36601
817   fr        3                  8                0                    0               0             0.0               0               0               18896
1135  it        3                  8                0                    0               0             0.0               0               0               5266
0     de        3                  8                0                    0               0             0.0               0               0               4690
734   es        3                  8                0                    0               0             0.0               0               0               4527
488   en        4                  8                0                    0               0             0.0               0               0               2576
190   en        3                  8                1                    0               0             0.0               0               0               1552
998   fr        4                  8                0                    0               0             0.0               0               0               1534
856   fr        3                  8                1                    0               0             0.0               0               0               739
151   en        3                  8                0                    0               0             0.0               1               0               649
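The duplicate counts can be reproduced by grouping identical rows. In the sketch below, every repeat after a row's first occurrence counts toward "Duplicate rows" (the convention of pandas' `DataFrame.duplicated(keep="first")`), while the count column of the Most frequent table is the full size of each identical-row group. That pandas-profiling uses exactly this convention is an assumption; if it holds, only 98913 - 91012 = 7901 rows in this dataset are distinct. `duplicate_summary` is a hypothetical helper, and the toy rows reuse only the first four columns.

```python
from collections import Counter


def duplicate_summary(rows, head=10):
    """Count duplicate rows and list the most frequent duplicated rows."""
    groups = Counter(tuple(row) for row in rows)
    # Assumption: every copy after the first occurrence is a duplicate
    # (keep="first" semantics); group sizes are reported as "count".
    n_duplicates = sum(c - 1 for c in groups.values())
    most_frequent = [(row, c) for row, c in groups.most_common(head) if c > 1]
    return n_duplicates, most_frequent


rows = [
    ("en", 3, 8, 0), ("en", 3, 8, 0), ("en", 3, 8, 0),
    ("fr", 3, 8, 0), ("fr", 3, 8, 0),
    ("de", 4, 8, 0),
]
n_dup, top = duplicate_summary(rows)
print(n_dup)  # → 3
print(top)    # → [(('en', 3, 8, 0), 3), (('fr', 3, 8, 0), 2)]
```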